CUDA Programming Guide: Fundamentals of CUDA Kernel Development

CUDA kernel development begins with defining a kernel: a specialized C++ function designed to execute in parallel across a large number of cores on an NVIDIA GPU. Kernels are the fundamental unit of work in the CUDA programming model, acting as the bridge where serial host logic turns into massively parallel execution on the device.

1. The __global__ Specifier

The __global__ declaration specifier is a mandatory qualifier for kernels. It instructs the compiler to generate code for the GPU while keeping the function's entry point visible to the CPU. Functions that execute on the GPU and can be invoked from the host are called kernels.
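To make the role of __global__ concrete, here is a minimal sketch of a kernel and a host function that launches it. The names scaleKernel and scaleOnGpu, the block size of 256, and the error-handling style are illustrative assumptions, not part of the original guide.

```cuda
#include <cuda_runtime.h>
#include <vector>

// __global__ marks this function as a kernel: it is compiled for the GPU
// and can be launched from host code. (scaleKernel is a hypothetical name.)
__global__ void scaleKernel(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // guard the last, partial block
        data[i] *= factor;
    }
}

// Hypothetical host helper: copies the input to the device, launches the
// kernel, and copies the result back. Returns false if any CUDA call fails.
bool scaleOnGpu(std::vector<float>& values, float factor) {
    int n = static_cast<int>(values.size());
    if (n == 0) return true;
    size_t bytes = n * sizeof(float);

    float* d_data = nullptr;
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(d_data, values.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;                          // assumed block size
        int blocks = (n + threads - 1) / threads;   // round up to cover n
        scaleKernel<<<blocks, threads>>>(d_data, factor, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(values.data(), d_data, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_data);
    return ok;
}
```

Calling scaleOnGpu on a vector holding 1, 2, 3 with a factor of 2 should leave 2, 4, 6 in the vector, provided a CUDA-capable device is available. Without the __global__ specifier, the triple-angle-bracket launch would not compile.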

2. Execution Environment

Kernels are dispatched to and executed on the Streaming Multiprocessors (SMs). The SM is the main compute engine inside an NVIDIA GPU and is responsible for managing hundreds to thousands of concurrent threads, depending on the architecture. Each SM manages the thread blocks assigned to it and schedules their threads onto its processing cores.
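The SM-related limits of a particular GPU can be inspected at runtime. The following sketch uses cudaGetDeviceProperties to print a few of them; the helper name reportSmInfo and the choice of fields are assumptions made for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: prints SM-related properties of the given device.
// Returns the number of SMs, or -1 if the query fails.
int reportSmInfo(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    std::printf("Device %d: %s\n", device, prop.name);
    std::printf("  SM count:              %d\n", prop.multiProcessorCount);
    std::printf("  Max threads per SM:    %d\n", prop.maxThreadsPerMultiProcessor);
    std::printf("  Max threads per block: %d\n", prop.maxThreadsPerBlock);
    std::printf("  Warp size:             %d\n", prop.warpSize);
    return prop.multiProcessorCount;
}
```

The printed values vary by GPU model, so no specific output is shown here. Calling reportSmInfo(0) queries the first device in the system.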

Syntax Rule: Kernels must return void. Because they run asynchronously with respect to the host, they cannot return a value directly to the CPU; they must write their results to memory allocated on the device.
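The rule above can be sketched as follows: the kernel writes its result through a device pointer, and the host copies it back after the kernel has finished. The names addKernel and addOnGpu are hypothetical, and the single-thread launch is chosen only to keep the example small.

```cuda
#include <cuda_runtime.h>

// A kernel cannot `return a + b;`. It stores the sum in device memory instead.
__global__ void addKernel(int a, int b, int* result) {
    *result = a + b;
}

// Hypothetical host helper: returns true on success and stores the sum in `out`.
bool addOnGpu(int a, int b, int& out) {
    int* d_result = nullptr;
    if (cudaMalloc(&d_result, sizeof(int)) != cudaSuccess) return false;

    // The launch only enqueues the kernel; control returns to the host at once.
    addKernel<<<1, 1>>>(a, b, d_result);

    // Check the launch, wait for the kernel to finish, then copy the result back.
    // (cudaMemcpy would also wait here; the explicit sync makes the step visible.)
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaDeviceSynchronize() == cudaSuccess &&
              cudaMemcpy(&out, d_result, sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_result);
    return ok;
}
```

With a CUDA-capable device present, addOnGpu(2, 3, sum) should leave 5 in sum. The host could do unrelated work between the launch and the synchronization call, which is the practical benefit of the asynchronous model.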


QUESTION 1

What is the primary function of the __global__ specifier?

It defines a function that runs on the CPU but is callable from the GPU.

It defines a kernel that runs on the GPU and is callable from the CPU.

It allocates memory on the GPU's SM cache.

It synchronizes all threads in a block.


QUESTION 2

Why must CUDA kernels return void?

Because they execute asynchronously and have no direct path to return values to the Host thread.

To save registers on the SM.

Because GPU memory is read-only.

The NVCC compiler does not support float returns.

QUESTION 3

Which hardware component is responsible for managing and executing threads in a CUDA kernel?

The PCIe Controller.

The Streaming Multiprocessor (SM).

The Host RAM controller.

The BIOS.

QUESTION 4

What happens when a Host calls a kernel function?

The CPU halts until the GPU finishes processing.

The GPU creates a clone of the function for every available SM.

The kernel is enqueued for execution on the GPU, and the CPU continues to the next instruction.

The CPU performs a context switch to the GPU.

QUESTION 5

Which of the following is the correct definition of a CUDA kernel?

A function that executes on the GPU and is invoked from the Host.

A C++ library for file I/O.

A hardware driver for NVIDIA GPUs.

A standard CPU function with the __gpu__ prefix.
